About the Provider
OpenAI is the organization behind Whisper Large v3. OpenAI is a major AI research lab and platform provider that builds a wide range of generative models, including text, image, code, and audio models.

Model Quickstart

This section helps you quickly get started with the openai/whisper-large-v3 model on the Qubrid AI inferencing platform.
To use this model, you need:
- A valid Qubrid API key
- Access to the Qubrid inference API
- Basic knowledge of making API requests in your preferred language
Once these are in place, you can send audio to the openai/whisper-large-v3 model and receive responses based on your input prompts.
Below are example placeholders showing how the model can be accessed from different programming environments. Choose the one that best fits your workflow.
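As a starting point, here is a minimal Python sketch of assembling a transcription request. The endpoint URL and payload field names are illustrative placeholders, not the documented Qubrid API contract; check your platform dashboard for the exact values before sending real requests.

```python
import json

# NOTE: the endpoint URL and payload shape below are illustrative assumptions,
# not the official Qubrid API specification.
QUBRID_ENDPOINT = "https://api.qubrid.example/v1/audio/transcriptions"

def build_transcription_request(api_key: str, audio_path: str, **params) -> dict:
    """Assemble the pieces of a request for openai/whisper-large-v3."""
    return {
        "url": QUBRID_ENDPOINT,
        "headers": {"Authorization": f"Bearer {api_key}"},
        "data": {"model": "openai/whisper-large-v3", "file": audio_path, **params},
    }

# Example: build a request and inspect it; send it with the HTTP client of
# your choice once the real endpoint and payload format are confirmed.
request = build_transcription_request("YOUR_API_KEY", "meeting.wav",
                                      task="transcribe", language="en")
print(json.dumps(request, indent=2))
```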
Model Overview
Whisper Large V3 is a general-purpose speech recognition and speech translation model developed by OpenAI. It is designed for high-accuracy automatic speech recognition (ASR) across a wide range of languages, audio qualities, and recording conditions. The model is trained on more than 5 million hours of labeled and pseudo-labeled audio, enabling strong zero-shot performance across datasets and domains. Whisper Large V3 improves upon previous versions with better multilingual accuracy and enhanced audio representation.

Model at a Glance
| Feature | Details |
|---|---|
| Model ID | openai/whisper-large-v3 |
| Provider | OpenAI |
| Model Type | Speech-to-Text (ASR) & Speech Translation |
| Architecture | Encoder-Decoder Transformer |
| Context Length | 30-second audio chunks |
| Model Size | 1.55B params |
| Inference Parameters | 8 |
Supported Languages
Whisper supports a wide range of languages; commonly used language codes include:
| Code | Language | Code | Language |
|---|---|---|---|
| en | English | es | Spanish |
| fr | French | de | German |
| zh | Chinese | ja | Japanese |
| ko | Korean | ru | Russian |
| ar | Arabic | hi | Hindi |
| pt | Portuguese | it | Italian |
When to Use
You should consider using Whisper Large V3 if:
- Transcription accuracy is more important than speed
- Your application requires support for many languages
- You work with noisy, low-quality, or challenging audio
- You need reliable speech recognition for long-form audio
- Your workflow includes speech translation or language identification
Inference Parameters
| Parameter Name | Type | Default | Description |
|---|---|---|---|
| Task | select | transcribe | Choose whether to transcribe to the original language or translate to English. |
| Language | select | en | Select the spoken language (auto-detect if unsure). |
| Temperature | number | 0 | Controls randomness of output — 0.0 means deterministic. |
| Initial Prompt | string | Business meeting conversation | Guides the model to better understand the audio context. |
| Word Timestamps | boolean | true | Return per-word timestamps for the transcription. |
| VAD Filter | boolean | true | Enable for long pauses or background noise; disable for tightly trimmed clips to save compute. |
| Return Segments | boolean | true | Return transcription with time-segment metadata. |
| Output Format | select | json | Choose the transcription output format. |
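To show how these parameters might be handled client-side, the sketch below merges user overrides onto the defaults from the table above. The defaults are taken from the table; the merge and validation logic itself is an illustrative assumption, not part of any Qubrid SDK.

```python
# Defaults mirror the inference-parameter table above; the helper itself
# is an illustrative sketch, not an official client library.
DEFAULTS = {
    "task": "transcribe",          # or "translate" (to English)
    "language": "en",
    "temperature": 0,              # 0.0 = deterministic decoding
    "initial_prompt": "Business meeting conversation",
    "word_timestamps": True,
    "vad_filter": True,
    "return_segments": True,
    "output_format": "json",
}

def make_params(**overrides) -> dict:
    """Merge user overrides onto the documented defaults, with basic checks."""
    params = {**DEFAULTS, **overrides}
    if params["task"] not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    if not 0 <= params["temperature"] <= 1:
        raise ValueError("temperature should stay within [0, 1]")
    return params

# Example: translate Spanish audio to English with mild sampling randomness.
params = make_params(task="translate", language="es", temperature=0.2)
```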
Key Model Features
- Uses 128 Mel frequency bins in the spectrogram input (previous versions used 80)
- Trained on 1M hours of weakly labeled audio and 4M hours of pseudo-labeled audio
- Shows 10–20% error reduction compared to Whisper Large V2 across many languages
- Designed to handle noisy audio, varied recording conditions, and diverse accents
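The 128-bin Mel front end mentioned above can be illustrated by computing mel-spaced band centers with the standard HTK mel formula, 2595 · log10(1 + f/700). This is a sketch of the general idea only; the exact filterbank Whisper uses (e.g. edge handling and filter shapes) may differ.

```python
import math

def hz_to_mel(f: float) -> float:
    # Standard HTK-style mel scale; Whisper's actual filterbank construction
    # may differ slightly -- this is for illustration only.
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_centers(n_mels: int = 128, fmin: float = 0.0,
                     fmax: float = 8000.0) -> list:
    """n_mels band centers evenly spaced on the mel scale.

    16 kHz input audio gives an 8 kHz Nyquist limit, hence fmax=8000.
    """
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    return [mel_to_hz(lo + (hi - lo) * (i + 1) / (n_mels + 1))
            for i in range(n_mels)]

# 128 centers, as in Whisper Large V3 (earlier versions used 80 bins).
centers = mel_band_centers()
```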
Supported Capabilities
- Multilingual speech recognition
- Speech translation
- Language identification
- Short-form and long-form transcription
Best Practices
- Use this model when accuracy is the top priority
- Process long audio using 30-second segments for optimal performance
- Prefer sequential long-form transcription for maximum accuracy
- Use chunked long-form transcription when faster processing is required
- Rely on this model for challenging audio conditions
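The 30-second chunking advice above can be sketched as a simple boundary calculator. The small overlap between windows (so words at chunk edges are not cut off) is an arbitrary example value, not a platform requirement.

```python
def chunk_spans(duration_s: float, chunk_s: float = 30.0,
                overlap_s: float = 1.0) -> list:
    """Split an audio duration into ~30 s (start, end) windows.

    A small overlap between consecutive windows helps avoid clipping words
    at chunk boundaries; the 1 s value here is illustrative.
    """
    spans, start = [], 0.0
    step = chunk_s - overlap_s
    while start < duration_s:
        spans.append((start, min(start + chunk_s, duration_s)))
        start += step
    return spans

# Example: a 95-second file becomes four overlapping 30-second windows.
windows = chunk_spans(95.0)
```

Each window can then be transcribed sequentially (for maximum accuracy) or in parallel (for speed), matching the best practices above.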